Software Maintenance


Cataloguing Hugging Face Models to Software Engineering Activities: Automation and Findings

González, Alexandra, Franch, Xavier, Lo, David, Martínez-Fernández, Silverio

arXiv.org Artificial Intelligence

Context: Open-source Pre-Trained Models (PTMs) provide extensive resources for various Machine Learning (ML) tasks, yet these resources lack a classification tailored to Software Engineering (SE) needs to support the reliable identification and reuse of models for SE. Objective: To address this gap, we derive a taxonomy encompassing 147 SE tasks and apply an SE-oriented classification to PTMs in a popular open-source ML repository, Hugging Face (HF). Method: Our repository mining study followed a five-phase pipeline: (i) identification of SE tasks from the literature; (ii) collection of PTM data from the HF API, including model card descriptions and metadata, and the abstracts of the associated arXiv papers; (iii) text processing to ensure consistency; (iv) a two-phase validation of SE relevance, involving humans and LLM assistance, supported by five pilot studies with human annotators and a generalization test; (v) data analysis. This process yielded a curated catalogue of 2,205 SE PTMs. Results: We find that most SE PTMs target code generation and coding, emphasizing implementation over early or late development stages. In terms of ML tasks, text generation dominates within SE PTMs. Notably, the number of SE PTMs has increased markedly since 2023 Q2, while evaluation remains limited: only 9.6% report benchmark results, mostly scoring below 50%. Conclusions: Our catalogue reveals documentation and transparency gaps, highlights imbalances across SDLC phases, and provides a foundation for automated SE scenarios, such as the sampling and selection of suitable PTMs.
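
For illustration, a minimal sketch of phase (ii) assuming the huggingface_hub Python client, keeping only models whose cards mention SE-related keywords. The keyword list and filtering rule are illustrative assumptions; the paper's pipeline adds human and LLM validation of SE relevance on top of such a step:

    # Minimal sketch, assuming the huggingface_hub client; the keyword
    # filter is an illustrative stand-in for the paper's two-phase
    # human/LLM validation of SE relevance.
    from huggingface_hub import HfApi, ModelCard

    SE_KEYWORDS = {"code generation", "bug fixing", "commit message", "test generation"}

    api = HfApi()
    candidates = []
    for info in api.list_models(limit=200, cardData=True):
        try:
            card_text = ModelCard.load(info.id).text.lower()
        except Exception:
            continue  # many repos lack a loadable model card
        if any(kw in card_text for kw in SE_KEYWORDS):
            candidates.append((info.id, info.pipeline_tag))

    print(len(candidates), "candidate SE PTMs")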


LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery

Akhavan, Arshia, Hosseinpour, Alireza, Heydarnoori, Abbas, Keshani, Mehdi

arXiv.org Artificial Intelligence

Issue-to-commit link recovery plays an important role in software traceability and improves project management. However, it remains a challenging task. A study on GitHub shows that only 42.2% of the issues are correctly linked to their commits. This highlights the potential for further development and research in this area. Existing studies have employed various AI/ML-based approaches, and with the recent development of large language models, researchers have leveraged LLMs to tackle this problem. These approaches suffer from two main issues. First, LLMs are constrained by limited context windows and cannot ingest all of the available data sources, such as long commit histories, extensive issue comments, and large code repositories. Second, most methods operate on individual issue-commit pairs; that is, given a single issue-commit pair, they determine whether the commit resolves the issue. This quickly becomes impractical in real-world repositories containing tens of thousands of commits. To address these limitations, we present LinkAnchor, the first autonomous LLM-based agent designed for issue-to-commit link recovery. The lazy-access architecture of LinkAnchor enables the underlying LLM to access the rich context of software, spanning commits, issue comments, and code files, without exceeding the token limit by dynamically retrieving only the most relevant contextual data. Additionally, LinkAnchor is able to automatically pinpoint the target commit rather than exhaustively scoring every possible candidate. Our evaluations show that LinkAnchor outperforms state-of-the-art issue-to-commit link recovery approaches by 60-262% in Hit@1 score across all our case study projects. We also publicly release LinkAnchor [1] as a ready-to-use tool, along with our replication package. LinkAnchor is designed and tested for GitHub and Jira, and is easily extendable to other platforms. Trace link recovery (TLR) is the process of identifying and establishing connections between related software artifacts, such as requirements, code, tests, and documentation.
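
A minimal sketch of the lazy-access idea, assuming a local git checkout: rather than placing the full history in the prompt, the agent exposes narrow accessor tools that the LLM invokes on demand. The tool names and dispatch loop below are illustrative assumptions, not LinkAnchor's actual interface:

    # Sketch of lazy context access: the LLM requests slices of the
    # repository via tools instead of receiving everything up front.
    import json
    import subprocess

    def list_commits(since, until, max_count=50):
        """Hashes and subjects of commits in a window (cheap overview)."""
        out = subprocess.run(
            ["git", "log", f"--since={since}", f"--until={until}",
             f"--max-count={max_count}", "--pretty=%H %s"],
            capture_output=True, text=True, check=True)
        return out.stdout.splitlines()

    def show_commit(sha):
        """Full message and file stats for one commit, fetched on demand."""
        out = subprocess.run(["git", "show", "--stat", sha],
                             capture_output=True, text=True, check=True)
        return out.stdout

    TOOLS = {"list_commits": list_commits, "show_commit": show_commit}

    def dispatch(tool_call_json):
        # In the real agent loop, the LLM emits this JSON and the result
        # is appended to its context, keeping token usage bounded.
        call = json.loads(tool_call_json)
        return TOOLS[call["name"]](**call.get("args", {}))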


SCALAR: A Part-of-speech Tagger for Identifiers

Newman, Christian D., Scholten, Brandon, Testa, Sophia, Behler, Joshua A. C., Banabilah, Syreen, Collard, Michael L., Decker, Michael J., Mkaouer, Mohamed Wiem, Zampieri, Marcos, AlOmar, Eman Abdullah, Alsuhaibani, Reem, Peruma, Anthony, Maletic, Jonathan I.

arXiv.org Artificial Intelligence

The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's internal model is trained using scikit-learn's GradientBoostingClassifier in conjunction with a manually-curated oracle of identifier names and their grammar patterns. This specializes the tagger to recognize the unique structure of the natural language used by developers to create all types of identifiers (e.g., function names, variable names, etc.). SCALAR's output is compared with a previous version of the tagger, as well as a modern off-the-shelf part-of-speech tagger, to show how it improves upon other taggers' output for annotating identifiers. The code is available on GitHub.

Index Terms: Program comprehension, identifier naming, part-of-speech tagging, natural language processing, software maintenance, software evolution.

The identifiers developers create represent a significant amount of the information other developers must use to understand related code. Given that identifiers represent, on average, 70% of the characters in a code base [1], and developers spend more time reading code than writing it [2], [3], it is important for researchers to better understand how identifiers convey information, and how they can be improved to increase developer reading efficiency.
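
A minimal sketch of the training setup, assuming scikit-learn, toy features, and a toy oracle; SCALAR's actual feature set, tag set, and hand-curated oracle differ:

    # Per-word POS tagging of identifier parts with a
    # GradientBoostingClassifier; features and oracle are illustrative.
    import re
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.feature_extraction import DictVectorizer
    from sklearn.pipeline import make_pipeline

    def split_identifier(name):
        # naive camelCase / snake_case splitter
        return [p.lower() for p in re.split(r"_|(?<=[a-z])(?=[A-Z])", name) if p]

    def word_features(words, i):
        return {"word": words[i], "position": i,
                "is_last": i == len(words) - 1,
                "ends_with_s": words[i].endswith("s"),
                "length": len(words[i])}

    # Toy oracle of (identifier, grammar pattern) pairs; V = verb,
    # NM = noun modifier, N = head noun.
    oracle = [("getUserName", ["V", "NM", "N"]),
              ("maxBufferSize", ["NM", "NM", "N"]),
              ("isEmpty", ["V", "N"])]

    X, y = [], []
    for name, tags in oracle:
        words = split_identifier(name)
        for i, tag in enumerate(tags):
            X.append(word_features(words, i))
            y.append(tag)

    tagger = make_pipeline(DictVectorizer(sparse=False), GradientBoostingClassifier())
    tagger.fit(X, y)

    words = split_identifier("getFileName")
    print(list(zip(words, tagger.predict([word_features(words, i) for i in range(len(words))]))))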


CoDocBench: A Dataset for Code-Documentation Alignment in Software Maintenance

Pai, Kunal, Devanbu, Premkumar, Ahmed, Toufique

arXiv.org Artificial Intelligence

One of the central tasks in software maintenance is being able to understand and develop code changes. Thus, given a natural language description of the desired new operation of a function, an agent (human or AI) might be asked to generate the set of edits to that function to implement the desired new operation; likewise, given a set of edits to a function, an agent might be asked to generate a changed description of that function's new workings. Thus, there is an incentive to train a neural model for change-related tasks. Motivated by this, we offer a new, "natural", large dataset of coupled changes to code and documentation mined from actual high-quality GitHub projects, where each sample represents a single commit where the code and the associated docstring were changed together. We present the methodology for gathering the dataset, along with some sample, challenging (but realistic) tasks where our dataset provides opportunities for both learning and evaluation. We find that current models (specifically Llama-3.1 405B, Mixtral 8×22B) do find these maintenance-related tasks challenging.
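
A minimal sketch of the mining idea, assuming the pydriller library: flag commits where a Python function's docstring and its code both changed. CoDocBench's actual pipeline, quality filters, and scale are more involved:

    # Flag commits where a function's docstring and body change together.
    import ast
    from pydriller import Repository

    def fn_info(source):
        """Map function name -> (docstring, dump of non-docstring body)."""
        out = {}
        try:
            tree = ast.parse(source or "")
        except SyntaxError:
            return out
        for node in ast.walk(tree):
            if isinstance(node, ast.FunctionDef):
                doc = ast.get_docstring(node)
                body = node.body[1:] if doc is not None else node.body
                out[node.name] = (doc, "\n".join(ast.dump(s) for s in body))
        return out

    for commit in Repository("path/to/repo").traverse_commits():
        for mf in commit.modified_files:
            if not mf.filename.endswith(".py"):
                continue
            before = fn_info(mf.source_code_before)
            after = fn_info(mf.source_code)
            for name in before.keys() & after.keys():
                doc_changed = before[name][0] != after[name][0]
                code_changed = before[name][1] != after[name][1]
                if doc_changed and code_changed:
                    print(commit.hash, mf.filename, name)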


Towards a Classification of Open-Source ML Models and Datasets for Software Engineering

González, Alexandra, Franch, Xavier, Lo, David, Martínez-Fernández, Silverio

arXiv.org Artificial Intelligence

Background: Open-Source Pre-Trained Models (PTMs) and datasets provide extensive resources for various Machine Learning (ML) tasks, yet these resources lack a classification tailored to Software Engineering (SE) needs. Aims: We apply an SE-oriented classification to PTMs and datasets on a popular open-source ML repository, Hugging Face (HF), and analyze the evolution of PTMs over time. Method: We conducted a repository mining study. We started with a systematically gathered database of PTMs and datasets from the HF API. Our selection was refined by analyzing model and dataset cards and metadata, such as tags, and confirming SE relevance using Gemini 1.5 Pro. All analyses are replicable, with a publicly accessible replication package. Results: The most common SE task among PTMs and datasets is code generation, with a primary focus on software development and limited attention to software management. Popular PTMs and datasets mainly target software development. Among ML tasks, text generation is the most common in SE PTMs and datasets. There has been a marked increase in PTMs for SE since 2023 Q2. Conclusions: This study underscores the need for broader task coverage to enhance the integration of ML within SE practices.


DeepCodeProbe: Towards Understanding What Models Trained on Code Learn

Majdinasab, Vahid, Nikanjam, Amin, Khomh, Foutse

arXiv.org Artificial Intelligence

Machine learning models trained on code and related artifacts offer valuable support for software maintenance but suffer from interpretability issues due to their complex internal variables. These concerns are particularly significant in safety-critical applications where the models' decision-making processes must be reliable. The specific features and representations learned by these models remain unclear, adding to the hesitancy in adopting them widely. To address these challenges, we introduce DeepCodeProbe, a probing approach that examines the syntax and representation learning abilities of ML models designed for software maintenance tasks. Our study applies DeepCodeProbe to state-of-the-art models for code clone detection, code summarization, and comment generation. Findings reveal that while small models capture abstract syntactic representations, their ability to fully grasp programming language syntax is limited. Increasing model capacity improves syntax learning but introduces trade-offs such as increased training time and overfitting. DeepCodeProbe also identifies specific code patterns the models learn from their training data. Additionally, we provide best practices for training models on code to enhance performance and interpretability, supported by an open-source replication package for broader application of DeepCodeProbe in interpreting other code-related models.
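
A minimal sketch of a probe in this spirit, assuming PyTorch and synthetic stand-in data; DeepCodeProbe's actual probes, target properties, and extraction of model representations differ:

    # Freeze a trained code model, take its hidden representations, and
    # fit a deliberately low-capacity classifier to predict a syntactic
    # property (here, a toy AST-node label). Data below is synthetic.
    import torch
    import torch.nn as nn

    hidden = torch.randn(1000, 256)         # stand-in frozen encoder outputs
    labels = torch.randint(0, 10, (1000,))  # stand-in syntactic labels

    probe = nn.Linear(256, 10)              # low capacity by design
    opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(20):
        opt.zero_grad()
        loss = loss_fn(probe(hidden), labels)
        loss.backward()
        opt.step()

    # High probe accuracy suggests the representations encode the
    # property; chance-level accuracy suggests they do not.
    acc = (probe(hidden).argmax(dim=1) == labels).float().mean()
    print(f"probe accuracy: {acc:.2f}")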


A Systematic Literature Review on the Use of Machine Learning in Software Engineering

Fred, Nyaga, Temkin, I. O.

arXiv.org Artificial Intelligence

Software engineering (SE) is a dynamic field that involves multiple phases, all of which are necessary to develop sustainable software systems. Machine learning (ML), a branch of artificial intelligence (AI), has drawn a lot of attention in recent years thanks to its ability to analyze massive volumes of data and extract useful patterns from it. Several studies have focused on examining, categorising, and assessing the application of ML in SE processes, yet an up-to-date synthesis of the primary studies is still lacking. We conducted a literature review of primary studies to address this gap, guided by research questions exploring the current state of the art in applying machine learning techniques to software engineering processes. The review identifies the key areas within software engineering where ML has been applied, including software quality assurance, software maintenance, software comprehension, and software documentation. It also highlights the specific ML techniques that have been leveraged in these domains, such as supervised learning, unsupervised learning, and deep learning.


Intelligent Software Tooling for Improving Software Development

Cooper, Nathan

arXiv.org Artificial Intelligence

Software has eaten the world: many of the necessities and quality-of-life services people use require software. Therefore, tools that improve the software development experience, such as generating code and test cases, detecting bugs, and question answering, can have a significant impact on the world. The success of Deep Learning (DL) over the past decade has shown huge advancements in automation across many domains, including software development processes. One of the main reasons behind this success is the availability of large datasets to train on, such as open-source code available through GitHub or image datasets of mobile Graphical User Interfaces (GUIs) like RICO [112] and ReDRAW [267]. Therefore, the central research question my dissertation explores is: in what ways can the software development process be improved through leveraging DL techniques on the vast amounts of unstructured software engineering artifacts? We coin the approaches that leverage DL to automate or augment various software development tasks as Intelligent Software Tools.


MaintainoMATE: A GitHub App for Intelligent Automation of Maintenance Activities

Nadeem, Anas, Sarwar, Muhammad Usman, Malik, Muhammad Zubair

arXiv.org Artificial Intelligence

Software development projects rely on issue tracking systems to track maintenance tasks such as bug reports and enhancement requests. Incoming issue-reports on these systems must be managed effectively: first labelled, then assigned to a developer with relevant expertise. This handling of issue-reports is critical and requires thorough scanning of the text entered in an issue-report, making it a labor-intensive task. In this paper, we present a unified framework called MaintainoMATE, which is capable of automatically categorizing issue-reports into their respective categories and further assigning them to a developer with relevant expertise. We use Bidirectional Encoder Representations from Transformers (BERT) as the underlying model for MaintainoMATE to learn the contextual information for automatic issue-report labeling and assignment tasks. We deploy the framework as a GitHub application. We empirically evaluate our approach on GitHub issue-reports to show its capability of assigning labels to issue-reports. We were able to achieve an F1-score close to 80%, which is comparable to existing state-of-the-art results. Similarly, our initial evaluations show that we can assign relevant developers to issue-reports with an F1-score of 54%, a significant improvement over existing approaches. Our initial findings suggest that MaintainoMATE has the potential to improve software quality and reduce maintenance costs by accurately automating activities involved in the maintenance process. Our future work will be directed towards improving the issue-assignment module.
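
A minimal sketch of the labeling module's setup, assuming the Hugging Face transformers library and an illustrative label set; the classification head below is randomly initialized and would be fine-tuned on labelled issue-reports before real use:

    # BERT-based issue-report labeling sketch; label set is assumed.
    from transformers import AutoTokenizer, AutoModelForSequenceClassification
    import torch

    labels = ["bug", "enhancement", "question"]  # illustrative labels
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=len(labels)
    )  # fine-tune on labelled issue-reports before real use

    report = "App crashes when opening the settings page on Android 14"
    inputs = tok(report, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    print(labels[logits.argmax(dim=-1).item()])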


DACOS-A Manually Annotated Dataset of Code Smells

Nandani, Himesh, Saad, Mootez, Sharma, Tushar

arXiv.org Artificial Intelligence

Researchers apply machine-learning techniques for code smell detection to counter the subjectivity of many code smells. Such approaches need large, manually annotated datasets for training and benchmarking. Existing literature offers a few datasets; however, they are small in size and, more importantly, do not focus on subjective code snippets. In this paper, we present DACOS, a manually annotated dataset containing 10,267 annotations for 5,192 code snippets. The dataset targets three kinds of code smells at different granularities: multifaceted abstraction, complex method, and long parameter list. The dataset is created in two phases. The first phase identifies code snippets that are potentially subjective by determining the thresholds of the metrics used to detect a smell. The second phase collects annotations for the potentially subjective snippets. We also offer an extended dataset, DACOSX, that includes definitely benign and definitely smelly snippets identified using the thresholds from the first phase. We have developed TagMan, a web application that helps annotators view and mark snippets one by one and records the provided annotations. We make the datasets and the web application publicly accessible. These datasets will help researchers working on smell-detection techniques to build relevant, context-aware machine-learning models.
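
A minimal sketch of the two-band threshold idea for one smell (complex method), with assumed cyclomatic-complexity cutoffs rather than the thresholds derived in the paper's first phase:

    # Snippets below a lower metric threshold are treated as definitely
    # benign, above an upper threshold as definitely smelly, and the band
    # in between as potentially subjective, to be sent to annotators.
    LOWER, UPPER = 5, 15  # assumed cyclomatic-complexity cutoffs

    def triage(cyclomatic_complexity):
        if cyclomatic_complexity < LOWER:
            return "definitely_benign"       # goes to DACOSX
        if cyclomatic_complexity > UPPER:
            return "definitely_smelly"       # goes to DACOSX
        return "potentially_subjective"      # goes to annotators (DACOS)

    for cc in (3, 9, 22):
        print(cc, triage(cc))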